Extending Value Reuse to Basic Blocks with Compiler Support

Authors

  • Jian Huang
  • David J. Lilja
Abstract

Speculative execution and instruction reuse are two important strategies that have been investigated for improving processor performance. Value prediction at the instruction level has been introduced to allow even more aggressive speculation and reuse than previous techniques. This study suggests that using compiler support to extend value reuse to a coarser granularity than a single instruction, such as a basic block, may have substantial performance benefits. We investigate the input and output values of basic blocks and find that these values can be quite regular and predictable. For the SPEC benchmark programs evaluated, 90% of the basic blocks have fewer than 4 register inputs, 5 live register outputs, 4 memory inputs and 2 memory outputs. About 16% to 41% of all the basic blocks are simply repeating earlier calculations when the programs are compiled with the -O2 optimization level in the GCC compiler. Compiler optimizations, such as loop unrolling and function inlining, affect the sizes of basic blocks, but have no significant or consistent impact on their value locality or on the resulting performance. Based on these results, we evaluate the potential benefit of basic block reuse using a novel mechanism called the block history buffer. This mechanism records input and live output values of basic blocks to provide value reuse at the basic block level. Simulation results show that using a reasonably sized block history buffer to provide basic block reuse in a 4-way issue superscalar processor can improve execution time for the tested SPEC programs by 1% to 14%, with an overall average of 9%, under reasonable hardware assumptions.
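To make the block history buffer idea more concrete, the C sketch below shows one plausible organization: a direct-mapped table indexed by a basic block's start PC, with each entry recording the block's register and memory inputs and its live outputs. On a lookup, if the recorded inputs match the current architectural state, the stored outputs are written back and the block is skipped rather than re-executed. The entry widths follow the 90th-percentile figures quoted in the abstract, but the table size, indexing, and interface (bhb_try_reuse, read_reg, and so on) are illustrative assumptions, not the paper's actual design.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative block history buffer (BHB) entry. Field widths
 * (4 register inputs, 5 live register outputs, 4 memory inputs,
 * 2 memory outputs) follow the abstract; everything else is an
 * assumption made for this sketch. */
#define BHB_ENTRIES 1024
#define MAX_REG_IN  4
#define MAX_REG_OUT 5
#define MAX_MEM_IN  4
#define MAX_MEM_OUT 2

typedef struct {
    bool     valid;
    uint64_t block_pc;                   /* start address of the basic block */
    int      n_reg_in, n_reg_out, n_mem_in, n_mem_out;
    int      reg_in_id[MAX_REG_IN];      /* architectural register numbers   */
    uint64_t reg_in_val[MAX_REG_IN];     /* input values seen last time      */
    int      reg_out_id[MAX_REG_OUT];
    uint64_t reg_out_val[MAX_REG_OUT];   /* live output values produced      */
    uint64_t mem_in_addr[MAX_MEM_IN];
    uint64_t mem_in_val[MAX_MEM_IN];
    uint64_t mem_out_addr[MAX_MEM_OUT];
    uint64_t mem_out_val[MAX_MEM_OUT];
} bhb_entry_t;

static bhb_entry_t bhb[BHB_ENTRIES];

/* Direct-mapped index on the block's start PC (an assumption; the
 * paper does not commit to a particular table organization here). */
static inline unsigned bhb_index(uint64_t pc) {
    return (unsigned)(pc >> 2) & (BHB_ENTRIES - 1);
}

/* Returns true when the recorded inputs match the current register
 * file and memory, in which case the stored live outputs are committed
 * and the whole block can be skipped instead of re-executed. */
bool bhb_try_reuse(uint64_t pc,
                   uint64_t (*read_reg)(int),
                   uint64_t (*read_mem)(uint64_t),
                   void (*write_reg)(int, uint64_t),
                   void (*write_mem)(uint64_t, uint64_t))
{
    bhb_entry_t *e = &bhb[bhb_index(pc)];
    if (!e->valid || e->block_pc != pc)
        return false;

    /* Compare all recorded register and memory inputs. */
    for (int i = 0; i < e->n_reg_in; i++)
        if (read_reg(e->reg_in_id[i]) != e->reg_in_val[i])
            return false;
    for (int i = 0; i < e->n_mem_in; i++)
        if (read_mem(e->mem_in_addr[i]) != e->mem_in_val[i])
            return false;

    /* All inputs match: commit the recorded live outputs directly. */
    for (int i = 0; i < e->n_reg_out; i++)
        write_reg(e->reg_out_id[i], e->reg_out_val[i]);
    for (int i = 0; i < e->n_mem_out; i++)
        write_mem(e->mem_out_addr[i], e->mem_out_val[i]);
    return true;
}
```

In this sketch the compiler's role would be to identify each block's input and live output sets so the hardware only has to compare and record those values; a miss would fall back to normal execution and update the entry.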


Related articles

Exploiting Basic Block Value Locality with Block Reuse

Value prediction at the instruction level has been introduced to allow more aggressive speculation and reuse than previous techniques. We investigate the input and output values of basic blocks and find that these values can be quite regular and predictable, suggesting that using compiler support to extend value prediction and reuse to a coarser granularity may have substantial performance bene...


Balancing Reuse Opportunities and Performance Gains with Subblock Value Reuse

The fact that instructions in programs often produce repetitive results has motivated researchers to explore various techniques, such as value prediction and value reuse, to exploit this behavior. Value prediction improves the available Instruction-Level Parallelism (ILP) in superscalar processors by allowing dependent instructions to be executed speculatively after predicting the values of the...


Compiler-Assisted Sub-Block Reuse

The fact that instructions in programs often produce repetitive results has motivated researchers to explore various alternatives to exploit this value locality, such as value prediction and value reuse. Value prediction improves the available Instruction-Level Parallelism (ILP) by allowing dependent instructions to be executed speculatively after predicting the values of their operands. Value ...


Improving Processor Performance Through Compiler-Assisted Block Reuse

Superscalar microprocessors currently power the majority of computing machines. These processors are capable of executing multiple independent instructions in each clock cycle by exploiting the Instruction-Level Parallelism (ILP) available in programs. Theoretically, there is a considerable amount of ILP available in most programs. However, the actual amount of exploitable ILP within a fixed in...


Exploiting Data Reuse in Modern FPGAs: Opportunities and Challenges for Compilers

Current high-end Field-Programmable-Gate-Array (FPGA) parts offer a large number of configurable resources. These can be organized in custom storage structures such as tapped-delay lines, in addition to a number of very dense high-capacity Random-Access-Memory (RAM) and Content-Addressable-Memory (CAM) blocks. The extreme flexibility of the size, organization and interconnection between these st...



Journal:
  • IEEE Trans. Computers

Volume: 49    Issue: –

Pages: –

Publication year: 2000